107 research outputs found
Approximating Dynamic Time Warping and Edit Distance for a Pair of Point Sequences
We give the first subquadratic-time approximation schemes for dynamic time
warping (DTW) and edit distance (ED) of several natural families of point
sequences in $\mathbb{R}^d$, for any fixed $d \geq 1$. In particular, our
algorithms compute $(1+\varepsilon)$-approximations of DTW and ED in time
near-linear for point sequences drawn from $\kappa$-packed or $\kappa$-bounded curves, and
subquadratic for backbone sequences. Roughly speaking, a curve is
$\kappa$-packed if the length of its intersection with any ball of radius $r$
is at most $\kappa \cdot r$, and a curve is $\kappa$-bounded if the sub-curve
between two curve points does not go too far from the two points compared to
the distance between the two points. In backbone sequences, consecutive points
are spaced at approximately equal distances apart, and no two points lie very
close together. Recent results suggest that a subquadratic algorithm for DTW or
ED is unlikely for an arbitrary pair of point sequences even for $d = 1$. Our
algorithms work by constructing a small set of rectangular regions that cover
the entries of the dynamic programming table commonly used for these distance
measures. The weights of entries inside each rectangle are roughly the same, so
we are able to use efficient procedures to approximately compute the cheapest
paths through these rectangles.
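For reference, the dynamic programming table mentioned in the abstract above can be sketched as follows. This is the standard exact quadratic-time DTW recurrence, not the paper's approximation scheme. The function name `dtw` and the use of Euclidean distance as the entry weight are assumptions made for illustration.

```python
import math


def dtw(p, q):
    """Exact DTW cost of two non-empty point sequences (tuples of equal dimension).

    Fills the full (m+1) x (n+1) table, so it takes O(m * n) time. This is the
    quadratic baseline, not the subquadratic approximation from the abstract.
    """
    m, n = len(p), len(q)
    inf = float("inf")
    # table[i][j] = cost of the cheapest warping path aligning p[:i] with q[:j]
    table = [[inf] * (n + 1) for _ in range(m + 1)]
    table[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            weight = math.dist(p[i - 1], q[j - 1])  # weight of table entry (i, j)
            table[i][j] = weight + min(
                table[i - 1][j],      # advance in p only
                table[i][j - 1],      # advance in q only
                table[i - 1][j - 1],  # advance in both
            )
    return table[m][n]
```

Each entry depends only on its three neighbours, which is why a region of entries with roughly equal weights can, in principle, be handled as a block rather than cell by cell.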
TempME: Towards the Explainability of Temporal Graph Neural Networks via Motif Discovery
Temporal graphs are widely used to model dynamic systems with time-varying
interactions. In real-world scenarios, the underlying mechanisms of generating
future interactions in dynamic systems are typically governed by a set of
recurring substructures within the graph, known as temporal motifs. Despite the
success and prevalence of current temporal graph neural networks (TGNN), it
remains uncertain which temporal motifs are recognized as the significant
indications that trigger a certain prediction from the model, which is a
critical challenge for advancing the explainability and trustworthiness of
current TGNNs. To address this challenge, we propose a novel approach, called
Temporal Motifs Explainer (TempME), which uncovers the most pivotal temporal
motifs guiding the prediction of TGNNs. Derived from the information bottleneck
principle, TempME extracts the most interaction-related motifs while minimizing
the amount of contained information to preserve the sparsity and succinctness
of the explanation. Events in the explanations generated by TempME are verified
to be more spatiotemporally correlated than those of existing approaches,
providing more understandable insights. Extensive experiments validate the
superiority of TempME, with up to 8.21% increase in terms of explanation
accuracy across six real-world datasets and up to 22.96% increase in boosting
the prediction Average Precision of current TGNNs. Comment: Accepted at NeurIPS 2023, Camera Ready Version
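As background for the abstract above, the generic information bottleneck objective can be written as below. This is the textbook form, given here as an assumption about the general principle rather than TempME's exact loss. The symbols are illustrative: $X$ is the input, $Y$ the prediction target, $Z$ the compressed explanation, and $\beta > 0$ a trade-off weight.

```latex
\min_{Z} \; -I(Z; Y) + \beta \, I(Z; X)
```

The first term rewards explanations that stay informative about the prediction; the second penalizes the amount of input information they retain, which corresponds to the sparsity and succinctness goal stated in the abstract.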
Graph Convolutional Neural Networks for Web-Scale Recommender Systems
Recent advancements in deep neural networks for graph-structured data have
led to state-of-the-art performance on recommender system benchmarks. However,
making these methods practical and scalable to web-scale recommendation tasks
with billions of items and hundreds of millions of users remains a challenge.
Here we describe a large-scale deep recommendation engine that we developed and
deployed at Pinterest. We develop a data-efficient Graph Convolutional Network
(GCN) algorithm PinSage, which combines efficient random walks and graph
convolutions to generate embeddings of nodes (i.e., items) that incorporate
both graph structure and node feature information. Compared to prior GCN
approaches, we develop a novel method based on highly efficient random walks to
structure the convolutions and design a novel training strategy that relies on
harder-and-harder training examples to improve robustness and convergence of
the model. We also develop an efficient MapReduce model inference algorithm to
generate embeddings using a trained model. We deploy PinSage at Pinterest and
train it on 7.5 billion examples on a graph with 3 billion nodes representing
pins and boards, and 18 billion edges. According to offline metrics, user
studies and A/B tests, PinSage generates higher-quality recommendations than
comparable deep learning and graph-based alternatives. To our knowledge, this
is the largest application of deep graph embeddings to date and paves the way
for a new generation of web-scale recommender systems based on graph
convolutional architectures. Comment: KDD 201
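The idea of using random walks to structure a graph convolution, as described in the abstract above, can be illustrated with a toy sketch. This is not PinSage's implementation. The function names, the visit-count weighting, and all parameter defaults are assumptions chosen for illustration, and the graph is a hypothetical four-node example.

```python
import random
from collections import Counter


def random_walk_neighbors(adj, start, num_walks=200, walk_len=3, top_t=2, seed=0):
    """Pick the top_t nodes most visited by short random walks from `start`.

    Returns (node, weight) pairs whose weights are normalized visit counts.
    """
    rng = random.Random(seed)
    visits = Counter()
    for _ in range(num_walks):
        node = start
        for _ in range(walk_len):
            nbrs = adj[node]
            if not nbrs:
                break
            node = rng.choice(nbrs)
            if node != start:
                visits[node] += 1
    top = visits.most_common(top_t)
    total = sum(count for _, count in top)
    return [(node, count / total) for node, count in top]


def aggregate(features, weighted_neighbors):
    """Weighted mean of neighbor feature vectors (one toy convolution step)."""
    dim = len(next(iter(features.values())))
    out = [0.0] * dim
    for node, weight in weighted_neighbors:
        for k in range(dim):
            out[k] += weight * features[node][k]
    return out


# Hypothetical toy graph and 2-dimensional node features.
adj = {"a": ["b", "c"], "b": ["a", "c"], "c": ["a", "b", "d"], "d": ["c"]}
features = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0], "d": [0.5, 0.5]}

neighbors = random_walk_neighbors(adj, "a")
embedding = aggregate(features, neighbors)
```

Sampling a fixed number of neighbors this way keeps the cost per node bounded regardless of its degree, which is one plausible reason such a scheme would suit very large graphs.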
Learning to Group Auxiliary Datasets for Molecule
The limited availability of annotations in small molecule datasets presents a
challenge to machine learning models. To address this, one common strategy is
to collaborate with additional auxiliary datasets. However, having more data
does not always guarantee improvements. Negative transfer can occur when the
knowledge in the target dataset differs from or contradicts that of the auxiliary
molecule datasets. In light of this, identifying the auxiliary molecule
datasets that can benefit the target dataset when jointly trained remains a
critical and unresolved problem. Through an empirical analysis, we observe that
combining graph structure similarity and task similarity can serve as a more
reliable indicator for identifying high-affinity auxiliary datasets. Motivated
by this insight, we propose MolGroup, which separates the dataset affinity into
task and structure affinity to predict the potential benefits of each auxiliary
molecule dataset. MolGroup achieves this by utilizing a routing mechanism
optimized through a bi-level optimization framework. Empowered by the meta
gradient, the routing mechanism is optimized toward maximizing the target
dataset's performance and quantifies the affinity as the gating score. As a
result, MolGroup is capable of predicting the optimal combination of auxiliary
datasets for each target dataset. Our extensive experiments demonstrate the
efficiency and effectiveness of MolGroup, showing an average improvement of
4.41%/3.47% for GIN/Graphormer trained with the group of molecule datasets
selected by MolGroup on 11 target molecule datasets.